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Abstract 

In this paper, two-state Markov switching models are proposed to study accident 
frequencies. These models assume that there are two unobserved states of road- 
way safety, and that roadway entities (roadway segments) can switch between these 
states over time. The states are distinct, in the sense that in the different states acci- 
dent frequencies are generated by separate counting processes (by separate Poisson 
or negative binomial processes). To demonstrate the applicability of the approach 
presented herein, two-state Markov switching negative binomial models are esti- 
mated using five-year accident frequencies on selected Indiana interstate highway 
segments. Bayesian inference methods and Markov Chain Monte Carlo (MCMC) 
simulations are used for model estimation. The estimated Markov switching mod- 
els result in a superior statistical fit relative to the standard (single-state) negative 
binomial model. It is found that the more frequent state is safer and it is correlated 
with better weather conditions. The less frequent state is found to be less safe and 
to be correlated with adverse weather conditions. 

Key words: Accident frequency; negative binomial; count data model; Markov 
switching; Bayesian; MCMC 



1 Introduction 



Vehicle accidents place an incredible social and economic burden on society. As 
a result, considerable research has been conducted on understanding and pre- 
dicting accident frequencies (the number of accidents occurring on roadway 
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segments over a given time period). Because accident frequencies are non- 
negativ e integers, count data mo dels are a reasonable statistical modeling ap- 
proach (IWashington et all 120031 ) . Simple mo deling approaches include Poisson 



models and negative binorn i al (NB) models (IHadi et al.l. Il995l : IShankar et al. 



1995 : Poch and Mannerine . 1996 : Miaou and Lord , 20031 ) . These models as- 
sume a single process for accident data generation (a Poisson process or 
a negative binomial process) and involve a nonlinear regression of the ob- 
served accident frequencies on various roadway-segment characteristics (such 
as roadway geometric and environmental factors). Because a pr eponderance o f 
zero-accident observa tions is often observed in empirical data, iMiaoul (Il994j ). 
Shankar et al.l (119971 ) and others have applied zero-inflated Poisson (ZIP) and 



zero-inflated negative binomial (ZINB) models for predicting accident frequen- 
cies. Zero-inflated models assume a two-state process for accident data gener- 
ation - one state is assumed to be safe with zero accidents (over the duration 
of time being considered) and the other state is assumed to be unsafe with a 
possibility of nonzero accident frequencies. In zero-inflated models, individual 
roadway segments are assumed to be always in the safe or unsafe state. While 
the application of zero-inflated models often provides a better statistical fit of 
observed acci dent frequency dat a, the applicability o f these models ha s been 
questioned bv lLord et all fl2005l . 2007). In particular. [Lord et al.l fl2005l . 2007) 
argue that it is unreasonable to expect some roadway segments to be always 
perfectly safe. In addition, they argue that zero-inflated models do not account 
for a likely possibility for roadway segments to change in time from one state 
to another. 



In this paper, two-state Markov switching count data models are explored as 
a method for studying accident frequencies. These models assume Markov 
switching (over time) between two unobserved states of roadway safetyPI 
There are a number of reasons to expect the existence of multiple states. 
First, the safety of roadway segments is likely to vary under different environ- 
mental conditions, driver reactions and other factors that may not necessar- 
ily be available to the analyst. For an illustration, consider Figure (H which 
shows weekly time series of the number of accidents on selected Indiana in- 
terstate segments during the 1995-1999 time interval. The figure shows that 
the number of accidents per week fluctuates widely over time. Thus, under 
different conditions, roads can become considerably more or less safe. As a re- 
sult, it is reasonable to assume that there exist two or more states of roadway 
safety. These states can help account for the existence of numerous uniden- 
tified and/or unobserved factors (unobserved heterogeneity) that influence 
roadway safety. Markov switching models are designed to account for unob- 



In fact, there may be more than two states but, for illustration purposes, the two- 
state case will be considered in this paper. Extending the approach to the possibility 
of additional states would significantly complicate the model structure. However, 
once this extension were done, additional states could be empirically tested. 
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Fig. 1. Weekly accident frequencies on the sample of Indiana interstate segments 
from 1995 to 1999 (the horizontal dashed line shows the average value). 

served states in a convenient and feasible way. In fact, these models have been 
successfully applied in other scientific fields. For example, two-state Markov 
switching autoregressive models have been used in economics, whe re the two 
state s are usually identified as economic rece ssion and expansion (IHamiltonl . 
19891 : iMcCuUoch and Tsavl . Il994l : IXsavl . l2002h . 



Another reason to expect the existence of multiple states is the empirical suc- 
cess of zero-inflated models, which are predicated on the existence of two-state 



process - a safe and an unsafe st ate (see lShankar et al.l . 119971 : ICarson and Mannering . 



200ll : lLee and Mannering) . l2002l ). Markov switching can be viewed as an exten- 
sion of previous work on zero- inflated models, in that it relaxes the assumption 



that a safe state exists (which has been brought up as a concern, see lLord et al 



( 120051 . 2007)) and instead considers two significantly different unsafe states. 
In addition, in contrast to zero-inflated models, Markov switching explicitly 
considers the possibility that roadway segments can change their state over 
time. 



2 Model specification 



Markov switching models are parametric and can be fully specified by a likeli- 
hood function /(Y|0, Ai), which is the conditional probability distribution of 
the vector of all observations Y, given the vector of all parameters of model 
Ai. In our study, we observe the number of accidents At^n that occur on the 
n^^ roadway segment during time period t. Thus Y = {v4f_„} includes all acci- 
dents observed on all roadway segments over all time periods {n = 1,2, Nt 
and t = 1,2, ... ,T). Model Ai = {M, Xj includes the model's name M 
(for example, M = "negative binomial") and the vector ^t,n of all roadway 
segment characteristic variables (segment length, curve characteristics, grades, 
pavement properties, and so on). 

To define the likelihood function, we first introduce an unobserved (latent) 
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state variable St, which determines the state of all roadway segments during 
time period t. At each t, the state variable St can assume only two values: St = 
corresponds to one state and = 1 corresponds to the other state. The state 
variable St is assumed to follow a stationary two-state Markov chain process 
in timejZI which can be specified by time-independent transition probabilities 
as 



P{st+i = l\st = 0) = po^i, P{st+i = 0\st=l)= pi^o- (1) 

Here, for example, P{st+i = = 0) is the conditional probability of St+i = 1 
at time t + 1, given that St = at time t. Note that P{st+i = 0\st = 0) = 
1 — po^i and P{st+i = l\st = 1) = 1 — pi^o- Transition probabilities po^i 
and pi-,0 are unknown parameters to be estimated from accident data. The 
stationary unconditional probabilities po and pi of states St = and St = 1 
ar^ 



Po = pi-^o/ {Po-^i + Pi->o) for state St = 0, 
Pi = Po^i/iPo^i +Pi^o) for state St = 1. 

Without loss of generality, we assume that (on average) state St = occurs 
more or equally frequently than state St = 1- Therefore, po > pi, and from 
Eqs. ([2]) we obtain restrictioE0 



Po^i < Pi~>o- (3) 

We refer to states St = and Si = 1 as "more frequent" and "less frequent" 
states respectively. 

Next, consider a two-state Markov switching negative binomial (MSNB) model 
that assumes negative binomial data-generating processes in each of the two 
states. With this, the probability of At^n accidents occurring on roadway seg- 
ment n during time period t is 



Markov property means that the probability distribution of s^+i depends only o n 
the value st at time t, but not on the previous history st-i, st-2, ■ ■ ■ ( Breiman . 19691 ). 



Stationarity of {st} is in the statistical sense. Below we will relax the assumption 
of stationarity and discuss a case of time-dependent transition probabilities. 
^ These can be found from stationarity condition s pg = ( 1 — po^i)po + Pi^oPi, 
Pi = Po^iPo + (1 - Pi->q)pi and po + pi = I (iBreimanl . Il969l ). 



^ Restriction ([3]) allows to avoid the problem of switching of state labels, 0^1. 
This problem would otherwise arise because of the symmetry of the likelihood func- 
tion ©-([I]) under the label switching. 
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T{l/at)At,J [l + atXt,n) ^l + aA,J ' 
_ . exp(/3'(o)Xf,„) if St = 
exp(/3(i)Xt,„) if st = l 



at ■■ 



0(0) if st = 
if = 1 

1,2,. ..,T, n=l,2,...,iV,. 



Here, Eg. (|11) is the stan dard negative binomial probability mass function 



( jWashington et al.l . l2003l ). r( ) is the gamma function, prime means trans- 
pose (so /3(Q) is the transpose of /3(o)), Nf is the number of roadway segments 
observed during time period t, and T is the total number of time periods. 
Parameter vectors /3(q) and /3(-^), and over- dispersion parameters a(o) > and 
> are the unknown estimable parameters of negative binomial models 
in the two states, St = and St = 1 respectively!!] We set the first component 
of ,1 to unity, and, therefore, the first components of /3(q-) and /3(]^) are the 
intercepts in the two states. 

If accident events are assumed to be independent, the likelihood function is 



T Nt 

/(Y|0,A^) = nnA^^- (6) 

f=l n=l 

Here, because the state variables St are unobservable, the vector of all es- 
timable parameters must include all states, in addition to all model param- 
eters (/3-s, a-s) and transition probabilities. Thus, 



® = [/3(o)5"{o),/3(i),Q;(i),Po->i,Pi-.o,S']', S' = [si, . . . , st]. (7) 

Vector S has length T and contains all state values. Eqs. ([I])-© define the 
two-state Markov switching negative binomial (MSNB) models considered in 
this study. 



° To ensure that a^o) and are non-negative, their logarithms are used in esti- 
mation. 
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3 Model estimation methods 



Statistical estimation of Markov switching models is complicated by unobserv- 
ability of the state variables As a result, the traditional maximum likeli- 
hood estimation (MLE) procedure is of very limited use for Markov switching 
models. Instead, a Bayesian inference approach is used. Given a model Ad 
with likelihood function /(Y|0, A^), the Bayes formula is 



HWIY M) - /C^-QI-^) - f{Y\@:M)n(@\M) 

^<°i''-^'--f(nM) //(Y,0|x)d0 ^ 

Here /(0|Y, Ai) is the posterior probability distribution of model parameters 
conditional on the observed data Y and model A4. Function f{Y,@\A4) 
is the joint probability distribution of Y and given model A4. Function 
f(Y\Ai) is the marginal likelihood function - the probability distribution of 
data Y given model A^. Function 7r(0|7Vl) is the prior probability distribu- 
tion of parameters that reflects prior knowledge about 0. The intuition behind 
Eq. ([HD is straightforward: given model A4, the posterior distribution accounts 
for both the observations Y and our prior knowledge of 0. We use the har- 
mon ic mean formula to calcu late the marginal likelihood /(Y|A^) of data Y 



see 



Kass and Rafteryl . Il995l ) as 



fiY\M)-' = J ^IyIq^;^! de = E [/(Y|0, Al)-^| y] , (9) 

where . . |Y) is the posterior expectation (which is calculated by using the 
posterior distribution). 

In our study (and in most practical studies), the direct application of Eq. ([8]) 
is not feasible because the parameter vector contains too many compo- 
nents, making integration over in Eq. ([8]) extremely difficult. However, the 
posterior distribution /(0|Y, A4) in Eq. ([S]) is known up to its normalization 
constant, /(0|Y,A^) oc f{Y\&,M)Tc{&\M). As a result, we use Markov 
Chain Monte Carlo (MCMC) simulations, which provide a convenient and 
practical computational methodology for sampling from a probability distri- 
bution known up to a constant (the posterior distribution in our case). Given 
a large enough posterior sample of parameter vector 0, any posterior expecta- 
tion and variance can be found and Bayesian inference can be readily applied. 
In the Appendix we describe our choice of prior distribution 7r(0|A1) and the 
MCMC simulations. The prior distribution is chosen to be wide and essentially 
noninformative. For the MCMC simulations in this paper, special numerical 



^ Below we will have 260 time periods (T = 260). In this case, there are 2^^*^ possible 
combinations for value of vector S = [si, . . . , st]'- 
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code was written in the MATLAB programming language and tested on arti- 
ficial accident data sets. The test procedure included a generation of artificial 
data with a known model. Then these data were used to estimate the under- 
lying model by means of our simulation code. With this procedure we found 
that the MSNB models, used to generate the artificial data, were reproduced 
successfully with our estimation code. 

For comparison of different models we use the following Bayesian approach. 
Let there be two models Aii and A^2 with parameter vectors ©i and ©2 
respectively. Assuming that we have equal preferences of these models, their 
prior probabilities are n{Aii) = it{M.2) = 1/2. In this case, the ratio of the 
models' posterior probabilities, P{Aii\Y) and P(A^2|Y), is equal to the Bayes 
factor. The later i s defin ed as the ratio of the models' marginal likelihoods 
(IKass and Raftervl . [l995l l Thus, we have 

P{M2\Y) f{M2,Y)/f{Y) f{Y\M2)n{M2) f{Y\M2) 
P{Mi\Y) f{Mi,Y)/f(Y) f(Y\Mi)n{M,) /(Y|A^i)' ^ ^ 

where f{A4i,Y) and f{Ai2,Y) are the joint distributions of the models and 
the data, /(Y) is the unconditional distribution of the data, and the marginal 
likelihoods fiY\Mi) and /(YjA^a) are given by Eq. ©. If the ratio in Eq. 
is larger than one, then model is favored, if the ratio is less than one, then 
model All is favored. An advantage of the use of Bayes factors is that it has an 
inherent penalty for including too many parameters in the model and guards 
against overfitting. 



4 Model estimation results 

Data are used from 5769 accidents that were observed on 335 interstate high- 
way segments in Indiana in 1995-1999. We use weekly time periods, t = 
1,2,3, ...,T = 260 in totalQ Thus, in the present study the state (st) is 
the same for all roadway segments and can change every week. Four types of 
accident frequency models are estimated: 

• First, we estimate a standard (single-state) negative binomial (NB) model 
without Markov switching by maximum likelihood estimation (MLE). We 
refer to this model as "NB-by-MLE" . 

• Second, we estimate the same standard negative binomial model by the 
Bayesian inference approach and the MCMC simulations. We refer to this 

^ A week is from Sunday to Saturday, there are 260 full weeks in the 1995-1999 time 
interval. We also considered daily time periods and obtained qualitatively similar 
results (not reported here). 
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model as "NB-by-MCMC" . As one expects, for our choice of a non-informative 
prior distribution, the estimated NB-by-MCMC model turned out to be very 
similar to the NB-by-MLE model. 

• Third, we estimate a restricted two-state Markov switching negative bino- 
mial (MSNB) model. In this restricted switching model only the intercept 
in the model parameters vector f3 and the over- dispersion parameter a are 
allowed to switch between the two states of roadway safety. In other words, 
in Eq. ([5]) only the first components of vectors /3(q-) and (3(^i-^ may differ, while 
the remaining components are restricted to be the same. In this case, the 
two states can have different average accident rates, given by Eq. but 
the rates have the same dependence on the explanatory variables. We refer 
to this model as "restricted MSNB" ; it is estimated by the Bayesian-MCMC 
methods. 

• Fourth, we estimate a full two-state Markov switching negative binomial 
(MSNB) model. In this model all estimable model parameters (/3-s and a) 
are allowed to switch between the two states of roadway safety. To obtain 
the final full MSNB model reported here, we consecutively construct and 
use 60%, 85% and 95% Bayesian credible intervals for evaluation of the 
statistical significance of each /5-parameter. As a result, in the final model 
some components of /3(q-) and are restricted to zero or restricted to be 
the same in the two states0 We do not impose any restrictions on over- 
dispersion parameters {as). We refer to the final full MSNB model as "full 
MSNB"; it is estimated by the Bayesian-MCMC methods. 

Note that the two states, and thus the MSNB models, do not have to exist. 
For example, they will not exist if all estimated model parameters turn out to 
be statistically the same in the two states, /3(o) = (which suggests the two 
states are identical and the MSNB models reduce to the standard NB model). 
Also, the two states will not exist if all estimated state variables St turn out 
to be close to zero, resulting in po^i ^ Pi^o (compare to Eq. ([3])), then the 
less frequent state = 1 is not realized and the process stays in state St = 0. 

The model estimation results for accident frequencies are given in Table [TJ 
Posterior (or MLE) estimates of all continuous model parameters (/3-s, a, 
Po^i and pi^o) are given together with the corresponding 95% confidence 
intervals for MLE models and 95% credible intervals for Bayesian-MCMC 
models (refer to the superscript and subscript numbers adjacent to parameter 
posterior/MLE estimates in Table [Tl) Table [2] gives summary statistics of all 

^ A /3-parameter is restricted to zero if it is statistically insignificaiit. A /3-parameter 
is restricted to be the same in the two states if the difference of its values in the 
two states is statistically insignificant. A (1 — a) credible interval is chosen in such 
way that the posterior probabilities of being below and above it are both equal to 
a/2 (we use significance levels a = 40%, 15%, 5%). 

^ Note that MLE estimation assumes asymptotic normality of the estimates, result- 
ing in confidence intervals being symmetric around the means (a 95% confidence 
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Fig. 2. The top plot is the same as Figured) The bottom plot shows weekly posterior 
probabilities P{st = 1|Y) for the full MSNB model. 

roadway segment characteristic variables Xt^„ (except the intercept). 

To visually see how the model tracks the data, consider Figure [H The top 
plot in Figure [2] shows the weekly accident frequencies in our data as given in 
Figure [H The bottom plot in Figure [2] shows corresponding weekly posterior 
probabilities P{st = 1|Y) of the less frequent state St = I for the full MSNB 
model. These probabilities are equal to the posterior expectations of St, P{st = 
1|Y) = 1 X P{st = 1|Y) + X P{st = 0|Y) = E{st\Y). Weekly values of 
P{st = 1|Y) for the restricted MSNB model are very similar to those given 
on the top plot in Figure [21 and, as a result, are not shown on a separate 
plot. Indeed, the time-correlation I between P{st = 1|Y) for the two MSNB 
models is about 99.5%. 



Turning to the estimation results, the findings show that two states exist and 
Markov switching models are strongly favored by the empirical data. In par- 
ticular, in the restricted MSNB model we over 99.9% confident that the differ- 



interval is ±1.96 standard deviations around the mean). In contrast, Bayesian esti- 
mation does not require this assumption, and posterior distributions of parameters 
and Bayesian credible intervals are usually non-symmetric. 

Here and below we calculate weighted correlation coefficients. For variable P{st = 
1|Y) = E(st\Y) we use weights wt inversely proportional to the posterior standard 
deviations of st- That is wt oc min {l/std(st |Y), median[l/std(st |Y)]}. 
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ence in values of /^-intercept in the two states is non-zero To compare the 
Markov switching models (restricted and full MSNB) with the corresponding 
standard non-switching model (NB), we calculate and use Bayes factors given 



by Eq. (ITU]) . We use Eq. and bootstrap simulations I for calculation of the 
values and the 95% confidence intervals of the logarithms of the marginal likeli- 
hoods given in Table [H The log-marginal-likelihoods are —16108.6, —15850.2 
and -15809.4 for the NB, restricted MSNB and full MSNB models respec- 
tively. Therefore, the restricted and full MSNB models provide considerable 
(258.4 and 299.2) improvements of the log-marginal-likelihoods of the data 
as compared to the corresponding standard non-switching NB model. Thus, 
given the accident data, the posterior probabilities of the restricted and full 
MSNB models are larger than the probability of the standard NB model by 
g258.4 g299.2 respectively. 

We can also use a classical statistics approach for model comparison, based 
on the maximum likelihood estimation (MLE). Referring to Tabled] the MLE 
gives the maximum log-likelihood value —16081.2 for the standard NB model. 
The maximum log-likelihood values observed during our MCMC simulations 
for the restricted and full MSNB models are —15786.6 and —15744.8 respec- 
tively. An imaginary MLE, at its convergence, would give MSNB log-likelihood 
values that were even larger than these observed values. Therefore, the MSNB 
models provide very large (at least 294.6 and 336.4) improvements in the max- 
imum log-likelihood value over the standard NB model. These improvements 
come with only modest increases in the number of free continuous model pa- 
rameters (/5-s and a-s) that enter the likelihood function. Both the Akaike 
Information Criterion (AIC) and the Bayesian Information Criterion (BIC^'^ 
strongly favor the MSNB models over the NB model. 



The difference of the intercept values is statistically non-zero despite the fact that 
the 95% credible intervals for these values overlap (see the "Intercept" line and the 
"Restricted MSNB" columns in Tabled]). The reason is that the posterior draws 
of the intercepts are correlated. The statistical test of whether the intercept values 
differ, must be based on evaluation of their difference. 

During bootstrap simulations we repeatedly draw, with replacement, posterior 
values of to calculate the posterior expectation in Eq. Q. In each of 10^ bootstrap 
draws that we make, the number of values drawn is 1/100 of the total number of 
all posterior values available from MCMC simulations. 

13 Minimization of AIC = 2K - 2LL and BIC = K \n{N) - 2LL ens ures an opti- 
mal choice of explanato ry variables in a model and avoids overfitting ( Tsay . 20021 : 
Washington et al.1 . l2003l l. Here K is the number of free continuous model parame- 



ters that enter the likelihood function, N is the number of observations and LL is 
the log-likelihood. When N > 8, BIC favors fewer free parameters than AIC does. 
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Table 1 



Estimation results for negative binomial models of accident frequency (the superscript and subscript numbers to the right of 
individual parameter posterior/MLE estimates are 95% confidence/credible intervals - see text for further explanation) 
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2.08?:|« 


2-o6?:i§ 


2.072:1° 


2.072:30 


2 072 -30 
^■•J' 1.84 


2 072-30 
■'•'J '1.84 


Posted speed limit (in mph) 


nir;4.0244 

.UIO* g(jg43 


m c;n 0241 


ni 61 -0251 
■^^"^.00697 


nici .0251 
.uioi Qog97 


nif!i.0252 
■'^^"■^.00712 


nicji .0252 
.UJ.UJ. Q0712 


Number of bridges per mile 


nr,, 0-. 00187 
•^^-^'^-.0407 


- 0241--"'^2i 

■'^^^-^-.0419 


p,9oo-. 00648 
.UZOO_ 0410 


r,9oq-. 00648 
.UZOO_ 0410 




-.06071 102^ 


Maximum of reciprocal values of horizontal curve radii (in 1 / mile) 


-.1821:122 


.1(,J_ 241 


-.178_:239 


-.178_:239 


- 175--"* 
■-^"^-.237 


-.i75::ij* 


Maximum of reciprocal values of vertical curve radii (in 1 /mile) 


00972 


m 77-027 


•0183:|jo9i7 


■0183:«2|f7 


•0184:|J2^4,, 


■oi84:!i274. 


Nuiiil>er of \x;rlica.i cui\'cs pci' mile 






-•"■"^•'-.0940 


.11249 


-•o^ii^-:;m7 


-■"•■'*'''-:09i7 


Percentage of single unit trucks (daily average) 






■'■•-'-^.701 


1 1q1.68 
'■■'-^.701 


.726ii2f 


^•^'1.77 



Table 1 



(Continued) 



Variable 


NB-by-MLE^ 


NB-by-MCMC ^ 


Restricted MSNB = 


Full MSNB-i 


state s = 


state s = 1 


state s = 


state 3 = 1 


Winter season (dummy) 


140.226 

.0698 


140.226 

.0689 


-iief^lli 


-.11610563^ 


- 159-0494 




Spring season (dummy) 


1~n-.0878 
— .LI 0_ 25S 


-.173I-««f 




-0932f5|59 






Summer season (dummy) 


_ 170- -('921 
— .266 


-.i8o::«3« 




-.0332^11/46 






Over-dispersion parameter a in NB models 


•^^ ' .845 


.968^,% 


coy. 677 


-^•^^.986 


440.595 




Mean accident rate (At,n for NB), averaged over all values of Xt,n 




.0663 


.0558 


.1440 


.0533 


.1130 


Standard deviation of accident rate ("^^ n(l ocXt n) for NB), 
averaged over all values of explanatory variables ^t,n 




.2050 


.1810 


.3350 


.1760 


.2820 


Markov transition probability of jump — ^ 1 (po^l) 






.0933:igi 


1 58-225 
•13°. 100 


Markov transition probability of jump 1 — (pi^o) 






•65i:!i§ 


fi27-^''3 
•0^'.474 


Unconditional probabilities of states and 1 (po and pi) 






.873:91? and .127:^03^ 


.798:S6| and .202:282 


Total number of free model parameters (/3-s and as) 


26 


26 


28 


28 


Posterior average of the log-likelihood (LL) 




— IfinQ? 9-16091-3 


-15821.SZ\ll°li 




Max(LL): estimated max. value of log-likelihood (LL) for MLE; 
max. observed LL during MCMC simulations for Bayesian estim. 


-16081.2 (MLE) 


-16086.3 (observ.) 


-15786.6 (observed) 


-15744.8 (observed) 


Logarithm of marginal likelihood of data (ln[/(Y|jV1)]) 






1 c;ai;n O-16840.1 
-15850.2_^5g49 5 


-15809.4lJ585;-7 


Goodness-of-fit p-value 




0.701 


0.729 


0.647 


Maximum of the potential scale reduction factors (PSRF) ^ 




1.00874 


1.00754 


1.00939 


Multivariate potential scale reduction factor (MPSRF) ^ 




1.00928 


1.00925 


1.01002 



* Standard (conventional) negative binomial estimated by maximum likelihood estimation (MLE). 



^ Standard negative binomial estimated by Markov Chain Monte Carlo (MCMC) simulations. 

Restricted two-state Markov s-witching negative binomial (MSNB) model with only the intercept and over-dispersion parameters allowed to vary between states. 
■1 Full two-state Markov switching negative binomial (MSNB) model with all parameters allowed to vary between states. 
^ The pavement quality index (PQI) is a composite measure of overall pavement quality evaluated on a to 100 scale. 

^ PSRF/MPSRF are calculated separately /jointly for all continuous model parameters. PSRF and MPSRF are close to 1 for converged MCMC chains. 



Tabic 2 



Summary statistics of roadway segment characteristic variables 



Variable 


Mean 


Standard deviation 


Minimum 


Median 


Maximum 


Accident occurring on interstates 1-70 or 1-164 (dummy) 


.155 


.363 








1.00 


Pavement quality index (PQI) average ^ 


88.6 


5.96 


69.0 


90.3 


98.5 


Road segment length (in miles) 


.886 


1.48 


.00900 


.356 


11.5 


Logarithm of road segment length (in miles) 


-.901 


1.22 


-4.71 


-1.03 


2.44 


Total number of ramps on the road viewing and opposite sides 


.725 


1.79 
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Number of ramps on the viewing side per lane per mile 


.138 


.408 








3.27 


Median configuration is depressed (dummy) 


.630 


.484 





1.00 


1.00 


Median barrier presence (dummy) 


.161 


.368 








1 


Interior shoulder presence (dummy) 


.928 


.258 





1 


1 


Width of the interior shoulder is less that 5 feet (dummy) 


.696 


.461 





1.00 


1.00 


Interior rumble strips presence (dummy) 


.722 


.448 





1.00 


1.00 


Width of the outside shoulder is less that 12 feet (dummy) 


.752 


.432 





1.00 


1.00 


Outside barrier absence (dummy) 


.830 


.376 





1.00 


1.00 


Average annual daily traffic (AADT) 


3.03 X 10** 


2.89 X 10* 


.944 X 10** 


1.65 X lO'* 


14.3 X 10* 


Logarithm of average annual daily traffic 


10.0 


.623 


9.15 


9.71 


11.9 


Posted speed limit (in mph) 


63.1 


3.89 


50.0 


65.0 


65.0 


Number of bridges per mile 


1.76 


8.14 








124 


Maximum of reciprocal values of horizontal curve radii (in 1/mile) 


.650 


.632 





.589 


2.26 


Maximum of reciprocal values of vertical curve radii (in 1/mile) 


2.38 


3.59 








14.9 


Number of vertical curves per mile 


1.50 


4.03 








50.0 


Percentage of single unit trucks (daily average) 


.0859 


.0678 


.00975 


.0683 


.322 


Winter season (dummy) 


.242 


.428 








1.00 


Spring season (dummy) 


.254 


.435 








1.00 


Summer season (dummy) 


.254 


.435 








1.00 



* The pavement quality index (PQI) is a composite measure of overall pavement quality evaluated on a to 100 scale. 



Focusing on the full MSNB model, which is statistically superior, its estimation 
results show that the less frequent state St = 1 is about four times the 
more frequent state St = (refer to the estimated values of the unconditional 
probabilities pQ and pi of states and 1, which are given in the "Full MSNB" 
columns in Tabled]). 

Also, the findings show that the less frequent state St = 1 is considerably less 
safe than the more frequent state Sj = 0. This result follows from the values of 
the mean weekly accident rate Xt^n [given by Eq. with model parameters f3- 
s set to their posterior means in the two states] , averaged over all values of the 
explanatory variables Xj_„ observed in the data sample (see "mean accident 
rate" in Table [1]). For the full MSNB model, on average, state = 1 has 



about two times more accidents per week than state St = hasliU Therefore, 
it is not a surprise, that in Figure [2] the weekly number of accidents (shown 
on the bottom plot) is larger when the posterior probability P{st = 1|Y) 
of the state = 1 (shown on the top plot) is higher. Note that the long- 
term unconditional mean of the accident rates is equal to the average of the 
mean accident rate over the two states, this average is calculated by using 
the stationary probabilities po and pi given by Eq. ([2]) (see the "unconditional 
probabilities of states and 1" in Tabled]). 

It is also noteworthy that the number of accidents is more volatile in the 
less frequent and less-safe state {st = 1). This is reflected in the fact that 
the standard deviation of the accident rate (stdt^„ = Xt^ni^ + aAt^n) for NB 
distribution), averaged over all values of explanatory variables ^t,n, is higher 
in state St = 1 than in state St = (refer to Table d])- Moreover, for the 
full MSNB model the over-dispersion parameter a is higher in state St = 1 
{a = 0.443 in state St = and a = 1.16 in state St = 1). Because state 
St = 1 is relatively rare, this suggests that over-dispersed volatility of accident 
frequencies, which is often observed in empirical data, could be in part due 
to the latent switching between the states, and in part due to high accident 
volatility in the less frequent and less safe state = 1. 

To study the effect of weather (which is usually unobserved heterogeneity in 
most data bases) on states. Table [3] gives time-correlation coefficients between 
posterior probabilities P{st = 1|Y) for the full MSNB model and weather- 
condition variables. These correlations were found by using daily and hourly 
historical weather data in Indiana, available at the Indiana State Climate 



Note that accident frequency rates can easily be converted from one time period 
to another (for example, weekly rates can be converted to annual rates). Because 
accident events are independent, the conversion is done by a summation of moment- 
generating (or characteristic) functions. The sum of Poisson variates is Poisson. The 
sum of NB variates is also NB if all explanatory variables do not depend on time 
0^t,n = X„). 
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Table 3 

Correlations of the posterior probabilities P{st = 1|Y) with weather-condition vari- 



ables for the full MSNB model 





All year 


Winter 

(November-March) 


Summer 

(May-September) 


Precipitation (inch) 


0.031 





0.144 


Temperature (°F) 


-0.518 


-0.591 


0.201 


Snowfall (inch) 

> 0.2 (dummy) 


0.602 


0.577 




0.651 


0.638 




Fog / Frost (dummy) 


0.223 


(frost) 0.539 


(fog) 0.051 


Visibility distance (mile) 


-0.221 


-0.232 


-0.126 



Office at Purdue University (www.agry.purdue.edu/climate). For these corre- 
lations, the precipitation and snowfall amounts are daily amounts in inches 
averaged over the week and across several weather observation stations that 



are located close to the roadway segments] I The temperature variable is 
the mean daily air temperature {°F) averaged over the week and across the 
weather stations. The effect of fog/frost is captured by a dummy variable that 
is equal to one if and only if the difference between air and dewpoint tempera- 
tures does not exceed 5°F (in this case frost can form if the dewpoint is below 
the freezing point 32° F, and fog can form otherwise). The fog/frost dummies 
are calculated for every hour and are averaged over the week and across the 
weather stations. Finally, visibility distance variable is the harmonic mean of 
hourly visibility distances, which are measured in miles every hour and are 
averaged over the week and across the weather stations 
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Table [3] shows that the less frequent and less safe state = 1 is positively 
correlated with extreme temperatures (low during winter and high during 
summer), rain precipitations and snowfalls, fogs and frosts, low visibility dis- 
tances. It is reasonable to expect that during bad weather, roads can become 
significantly less safe, resulting in a change of the state of roadway safety. As a 
useful test of the switching between the two states, all weather variables, listed 
in Table [3], were added into our full MSNB model. However, when doing this, 
the two states did not disappear and the posterior probabilities P{st = 1|Y) 
did not changed substantially (the correlation between the new and the old 
probabilities was around 90%). 



Snowfall and precipitation amounts are weakly related with each other because 
snow density {g/crrfi) can vary by more than a factor of ten. 

The harmonic mean d of distances d„ is calculated as = / N) J2n=i '^n^ i 
assuming dn = 0.25 miles if dn < 0.25 miles. 
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In addition to the MSNB models, we estimated two-state Markov switching 
Poisson (MSP) models, which have the Poisson likelihood function instead of 
the NB likelihood function in Eq. (jl]). Our findings for the M SP models are 
very similar to those for the MSNB models ( Malyshkina . 2008I ). Also, because 
the time series in Figure [2] seems to exhibit a seasonal pattern (roads appear 
to be less safe and P{st = 1|Y) appears to be higher during winters), we esti- 
mated MSNB and MSP models in which the transition probabilities po^i and 
Pi^o a-i^e not constant (allowing each of them to assume two different values: 
one during winters and the other during all remaining seasons). However, these 
models did not perform as well as the MSNB and MSP models with constant 
transition probabilities [as judged by the Bayes factors, see Eq. (fTOj) ] 



5 Summary and conclusions 



The empirical finding that two states exist and that these states are correlated 
with weather conditions has important implications. The findings suggest that 
multiple states of roadway safety can exist due to slow and/or inadequate 
adjustment by drivers (and possibly by roadway maintenance services) to ad- 
verse conditions and other unpredictable, unidentified, and/or unobservable 
variables that infiuence roadway safety. All these variables are likely to interact 
and change over time, resulting in transitions from one state to the next. 

As discussed earlier, the empirical findings show that the less frequent state 
is significantly less safe than the other, more frequent state. The full MSNB 
model results show that explanatory variables „, other than the intercept, 
exert different influences on roadway safety in different states as indicated by 
the fact that some of the parameter estimates for the two states of the full 



MSNB model are signiflcantly different! I Thus, the states not only differ by 



average accident frequencies, but also differ in the magnitude and/or direction 
of the effects that various variables exert on accident frequencies. This again 
underscores the importance of the two-state approach. 

The Markov switching models presented in this study are similar to zero- 
inflated count data models (which have been previously applied in accident 
frequency research) in the sense that they are also two-state models (see 



We have only five winter periods in our five-year data. MSNB and MSP with 
seasonally changing transition probabilities could perform better for an accident 
data that covers a longer time interval. 

Table [1] shows that parameter estimates for pavement quality index, total number 
of ramps on the road viewing and opposite sides, average annual daily traffic, number 
of bridges per mile, and percentage of single unit trucks are all significantly different 
between the two states. 
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Shankar et al.l . 119971 : iLord et al.l . 120071 ). However, in contrast to zero- inflated 



models, the models presented herein allow for switching between the two states 
over time. Also, in this study, a "safe" state is not assumed and accident fre- 
quencies can be nonzero in both states. 

In terms of future work on Markov switching models for roadway safety, ad- 
ditional empirical studies (for other accident data samples), and multi-state 
models (with more than two states of roadway safety) are two areas that would 
further demonstrate the potential of the approach. 
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Appendix: MCMC simulation algorithm 



For brevity, in this appendix we omit model notation Ai in all equations. For 
example, here we write the posterior distribution, given by Eq. ([H]), as /(©|Y). 

We choose the prior distribution 7r(0) of parameter vector 0, given by Eq. ([7]), 

as 



7i"(0) = /(Sbo^i,Pi^o)7r(po^i,pi^o) n 



s=0 



7i-(a(s))n^(A«)>* 



T T 

/(Sbo^i,Pi^o) = P{sr)X{P{st\st-i) ^X{P{st\st_,) 



t=2 



t=2 



(A.l) 
(A.2) 



(po^i)'"°-Hi -P^^ir^'{Pi-^or'-'{i-Pi^or-' (a.3) 



Here (3<^s),k is the k component of vector fif^^y The priors of and 

are chosen to be normal distributions, vr(a;(s)) = J\f{fJ,a,'^a) and 7i{P(^s),k) = 
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Af{fik,'^k)- The joint prior of po^i and pi^o is given by Eq. (lA.2p . where 
7r(po^i) = Beta{vo, uq) and 7r(pi^o) = Beta{vi, z/i) are beta distributions, and 
function /(po^i < Pi^o) is equal to unity if the restriction given by Eq. is 
satisfied and to zero otherwise. We disregard distribution P(si) in Eq. (1A.3P 
because its contribution is neghgible when T is large (alternatively, we can 
assume P{si = 0) = P(si = 1) = 1/2). Number rii^j is the total number of 
state transitions i —>■ j (where i,j = 0,1). Parameters that enter the prior 
distributions are called hyper-parameters. For these, the means Ha and /i^ 
are equal to the maximum likelihood estimation (MLE) values of a and jSk 
for the corresponding standard single-state model (NB model in this study). 
The variances and are ten times larger than the maximum between 
the MLE values of a and f3k squared and the MLE variances of a and f3k for 
the standard model. We choose t;o = t'o = "^i = ^^i = 1 (in this case the beta 
distributions become the uniform distribution between zero and one). 



To obtain draws from a posterior distribution, we use the hybrid Gibbs sam- 
pler, which is an MCMC sim ulation algorithm tha t invo l ves both G i bbs and 



Metr opolis-Hasting sampling (IMcCulloch and Tsayl . ll994lTsayl . l2002l : ISAS Institute Inc. 
20061 ). Assume that is composed of K components: © = [0[,62, . . . , O'j^]' , 
where 6^ can be scalars or vectors, k = 1,2, . . . , K. Then, the hybrid Gibbs 
sampler works as follows: 



(1) Choose an arbitrary initial value of the parameter vector, = 0^°^ , 
such that /(Y, 0(o)) > 0. 

(2) For each g = 1,2,3, ... , parameter vector 0'-^) is generated component- 
by-component from 0(9"^) by the following procedure: 

(a) First, draw 6i'^ from the conditional posterior probability distribu- 
tion f{0'f^ |Y, 6^2~^\ • • • ! ^j?"^'')- If this distribution is exactly known 
in a closed analytical form, then we draw O'f^ directly from it. This 
is Gibbs sampling. If the conditional posterior distribution is known 
up to an unknown normalization constant, then we draw fli^'* by us- 
ing the Metropolis-Hasting (M-H) algorithm described below. This 
is M-H sampling. 

(b) Second, for all = 2, 3, . . . , K — 1, draw O'f^ from the conditional pos- 
terior distribution f{e^^^ I Y, 6>^^\ . . . , e^^\, e^^^^\ e%-^^) by us- 
ing either Gibbs sampling (if the distribution is known exactly) or 
M-H sampling (if the distribution is known up to a constant). 

(c) Finally, draw 0^'' from conditional posterior probability distribution 
f{0^^\Y, 6^i \ . . . , ^x-i) tiy using either Gibbs or M-H sampling. 

(3) The resulting Markov chain {0^^^} converges to the true posterior dis- 
tribution /(0|Y) as g oo. 



Note that all conditional posterior distributions are proportional to the joint 
distribution /(Y, 0) = /(Y|0)7r(0), where the likelihood /(Y|0) is given 
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by Eq. and the prior 7r(0) is given by Eq. (lA.ip . 



By using the hybrid Gibbs sampler algorithm described above, we obtain a 
Markov chain {@^^^, where g = 1, 2, ... , Gbi, Gu + 1, . . . , G. We discard the 
first Gu "burn-in" draws because they can depend on the initial choice 0*^°\ 
Of the remaining G — Gu draws, we typically store every third or every tenth 
draw in the computer memory. We use these draws for Bayesian inference. 
Our typical choice is Gb, = 3 x 10^ and G = 3 x 10^. In our study, one MCMC 
simulation run takes few days on a single computer CPU. We usually con- 
sider eight choices of the initial parameter vector Thus, we obtain eight 
Markov chains of 0, and use them for the Brooks-Gelman-Rub i n dia gnos- 



tic of convergence of our MCMC simulations (iBrooks and Gelmaru . Il998l ). We 
also check convergence by monitoring the likelihood /(Y|0(^'') and the joint 
distribution /(Y,0(3)). 

The Metropolis-Hasting (M-H) algorithm is used to sample from conditional 



posterior distributions known up to their normalization constants! I For brevity, 
we use notation Y, 0\0,) = f{e^\Y,e^f\...,ef_,,e'i-^\...,dt^^), 

where @\6k means all components of except 6^. The M-H algorithm is as 
follows: 

• Choose a jumping probability distribution J{6k\0k) of Ok- It must stay the 
same for all draws g = Gbi + I, ■ ■ ■ ,G, and we discuss its choice below. 

• Draw a candidate dk from J{6k\6^jf^^^). 



Calculate ratio 
Set 



P = X J^ . (A.4) 



<g) _ I Ok with probability min(p, 1), 
I Q]: otherwise. 

Note that the unknown normalization constant of fg{. . .) cancels out in Eq. (IA.4p . 

In this study is given by Eq. ([7]), and the hybrid Gibbs sampler generates 
draws 0''^'' from Q^s-^) follows (for brevity, below we drop g indexing): 

(a) We draw vector /3(o) component-by-component by using the M-H algo- 
rithm. For each component /5(o),fc of /3(o) we use a normal jumping dis- 
tribution J(/3(o),fc|/3(o),fc) = A/'(/3(o),fc,o-(^o),fc)- Variances af^^ f^ are adjusted 



In general, the M-H algorithm allows to make draws from any probability distri- 
bution known up to a constant. The algorithm converges as the number of draws 
goes to infinity. 
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during the burn-in sampling {g = 1,2, ... ,Gbi) to have approximately 



30% acceptance rate in Eq. (lA.Spl The conditional posterior distribu- 
tion of /3(o),fe is 

/(/5(o),fc|Y,0\/3(o),fc) oc /(Y,0) = /(Y|0)7r(0) cx /(Y|0)7r(/?(o),fc). 

(b) We draw a^o) first, all components of /3(i) second, and «(!) third, from 
their conditional posterior distributions by using the M-H algorithm in a 
way very similar to the drawing the components of /3(q). In all cases, we 
use normal jumping distributions with variances chosen to have ~ 30% 
acceptance rates. 

(c) By using Gibbs sampling, we draw, first, po^i and, second, pi^o from 
their conditional posterior distributions, which are truncated beta distri- 
butions, 

/(j9o^i|Y, 0\j9o^i) oc /(Y, 0) oc /(S|]9o^i,pi^o)7r(po^i,j9i^o) oc 

oc Beta{vo + ^o-^i, z/q + no^o)^(Po^i < Pi^o), (^-6) 
/(pi^o|Y, 0\pi^o) oc Beta{vi + ni^o, i^i + ni^i)/(po^i < pi^o)- 

(d) Finally, we draw components of S = [si, S2, . . . , st]' by Gibbs sampling. 
Neighboring components of S can be strongly (anti-)correlated. There- 
fore, to speed up MCMC convergence in this case, we draw subsections 
St,T = [st, St+i, . . . , St+T-i]' of S at a time. The conditional posterior dis- 
tribution of St^T is 

/(V|Y,0W) oc /(Y,0) oc /(Y|0)/(Sbo^i,pi^o)- (A.7) 

Vector St^r has length r and can assume 2'^ possible values. By choosing 
r small enough, we can compute the right-hand-side of Eq. flA.7p for each 
of these values and find the normalization constant of /(Sf,T-|Y, 0\St^T-). 
This allows us to make Gibbs sampling of St^r- Our typical choice of r is 
from 10 to 14. We draw all subsections St^r one after another. 

Additional details on MCMC simulation algorithms and their implemen tation 



in the context of accident modeling can be found in iMalyshkinal (120081 ). 
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We also tried Cauchy jumping distributions and obtained similar results. 
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